Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move from Pybind11 to nanobind #345

Merged
merged 44 commits into from
Aug 30, 2024

Conversation

ManifoldFR
Copy link
Member

@ManifoldFR ManifoldFR commented Aug 5, 2024

As discussed in #340, this PR sets us up with nanobind as the bindings support backend, instead of pybind11 (as a submodule, just as we used pybind11).

It does the necessary changes:

  • in the CMake listfile
  • in the bindings/python files as required, changing the pybind11 headers, including the relevant nanobind headers (note that the pybind11 stl header was split up in several headers in nanobind, namely for maps, vectors, and strings) and the workspace pybind11:: to nanobind::

Work left to do

Notes

  • bindings for the library now require a C++17-compliant compiler
  • nanobind::arg_v is not the same thing as it used to be in pybind11. Firstly, default arguments are specified differently (using an assignment expression, just as in Boost.Python). Secondly, function argument docstrings seem to be gone. This means that a bunch of documentation for class ctor and function arguments is gone. I'm not sure where to recover them?
  • I edited the workflow files (except the changelog check) to make them ignore updates to the Changelog

Closes #340

@hrp2-14
Copy link
Member

hrp2-14 commented Aug 5, 2024

Hi ! This project doesn't usually accept pull requests on the main branch.
If this wasn't intentionnal, you can change the base branch of this PR to devel
(No need to close it for that). Best, a bot.

@ManifoldFR ManifoldFR changed the base branch from main to devel August 5, 2024 18:17
@jcarpent jcarpent marked this pull request as draft August 7, 2024 08:12
@ManifoldFR ManifoldFR force-pushed the topic/nanobind branch 3 times, most recently from 7b780d9 to 62b9ea4 Compare August 7, 2024 10:10
@ManifoldFR
Copy link
Member Author

rebased on devel after #342

@ManifoldFR
Copy link
Member Author

Seems that on macOS CI, CMake is picking up Python 3.12 from the homebrew instead of the prescribed conda Python 3.10

@ManifoldFR ManifoldFR force-pushed the topic/nanobind branch 5 times, most recently from 609c415 to d4b4f6e Compare August 8, 2024 15:32
@ManifoldFR
Copy link
Member Author

ManifoldFR commented Aug 9, 2024

Most tests are good.
The stragglers are Windows and a Julia test on Ubuntu ?

@fabinsch
Copy link
Collaborator

fabinsch commented Aug 9, 2024

Nice @ManifoldFR good job ! The Julia example shows how you can use the Python package proxsuite in Julia code. and from the failing test I can see here that there is a problem with the get method now (no method matching get(::Matrix{Float64}, ::PyObject)).

You said that this is handled differently now in nanobind vs how it was done before in pybind.

For windows, we have issues with finding and linking against the Python::Module.

@ManifoldFR
Copy link
Member Author

Nice @ManifoldFR good job ! The Julia example shows how you can use the Python package proxsuite in Julia code. and from the failing test I can see here that there is a problem with the get method now (no method matching get(::Matrix{Float64}, ::PyObject)).

You said that this is handled differently now in nanobind vs how it was done before in pybind.

I'm not sure what's happening there. It's all numpy stuff on this line so it shouldn't change?

For windows, we have issues with finding and linking against the Python::Module.

I've discussed the issues with the module finding on this issue: jrl-umi3218/jrl-cmakemodules#708
Nanobind uses the modern, unified FindPython module (which checks for Python 3 or 2, and creates the Python::Module target) instead of FindPython3 which we use and creates Python3::Module.
This usually didn't matter for us because we were never using the Python*::Module modern CMake targets. But Nanobind actually expects Python::Module to exist.

@ManifoldFR
Copy link
Member Author

Actually I'm getting the Julia error locally, too. The same one. I don't know what's wrong...

@ManifoldFR
Copy link
Member Author

ManifoldFR commented Aug 10, 2024

Okay I looked into the Julia documentation for arrays to see what was going on. In the Julia REPL I typed the following

julia> B = ones(2,10)
2×10 Matrix{Float64}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

julia> strides(B)
(1, 2)

The stride might look strange, but it's not:

  1. Julia expresses strides in underlying data type, Float64 here, white Numpy expresses them in bytes
  2. For a C-ordered AKA row-major array (B.flags.c_contiguous returns True) in Python/Numpy, the stride is (80, 8) i.e. (10, 1) in float64s.
    A stride of (1,2) is actually an indication that you have a column-major or Fortran-style array, which is actually the order used in Julia.

What's going on in the failing example here is that, whatever you do, calling, Numpy arrays are converted to Julia (column-major arrays).

The behaviour we're getting there is actually the same that made other tests and examples fail:

  • Scipy sparse matrices sometimes convert to column-major instead of numpy's default row-major when calling .toarray())
  • nanobind does not like converting to Eigen::Ref<const Matrix<T, N, M, RowMajor>> (e.g. proxsuite::proxqp::dense::MatRef<T>) from column-major arrays, due to the incompatible memory layouts. It doesn't perform a copy in this case for some reason, although the docs say they should (though this is for the non-const case?)

A workaround is this: the point of the nanobind::DRef<> template is that it uses a two-way dynamic (inner and outer) stride and can process whatever data you throw at it, just like nanobind's docs say. The template default for Eigen::Ref uses a non-dynamic inner stride (equal to one).

Or we could figure out why nanobind is not copying the data in this specific case. (the version of nanobind we currently use is the 2.0.0 release)

@ManifoldFR
Copy link
Member Author

I applied your comments @fabinsch. I also excluded the CHANGELOG from CI.

fabinsch
fabinsch previously approved these changes Aug 28, 2024
@ManifoldFR
Copy link
Member Author

We can merge @jcarpent, I'll let you do the honors 😃

Copy link
Member

@nim65s nim65s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not compiling with nanobind v2.1.0:

In file included from /…/proxsuite/bindings/python/src/expose-results.hpp:9,
                 from /…/proxsuite/bindings/python/src/algorithms.hpp:8,
                 from /…/proxsuite/bindings/python/src/expose-all.cpp:11:
/…/proxsuite/bindings/python/src/optional-eigen-fix.hpp: At global scope:
/…/proxsuite/bindings/python/src/optional-eigen-fix.hpp:17:20: error: expected template-name before '<' token
   17 |   : optional_caster<std::optional<Eigen::Ref<const T>>>
      |                    ^

The nanobind submodule is on 8a65e0. On that commit it is working, but is it really necessary to use 3 days old unstable version ?
v2.1.0 is 3 week old, and only 25 commits away from the current master.

@ManifoldFR
Copy link
Member Author

This is not compiling with nanobind v2.1.0:

In file included from /…/proxsuite/bindings/python/src/expose-results.hpp:9,
                 from /…/proxsuite/bindings/python/src/algorithms.hpp:8,
                 from /…/proxsuite/bindings/python/src/expose-all.cpp:11:
/…/proxsuite/bindings/python/src/optional-eigen-fix.hpp: At global scope:
/…/proxsuite/bindings/python/src/optional-eigen-fix.hpp:17:20: error: expected template-name before '<' token
   17 |   : optional_caster<std::optional<Eigen::Ref<const T>>>
      |                    ^

The nanobind submodule is on 8a65e0. On that commit it is working, but is it really necessary to use 3 days old unstable version ? v2.1.0 is 3 week old, and only 25 commits away from the current master.

Excellent catch, I was using the new optional_caster class from which the standard type_caster are derived in the post-v2.1.0 version. I'll try to make this compatible with v2.1.0 !

@ManifoldFR
Copy link
Member Author

ManifoldFR commented Aug 30, 2024

I just pushed two commits which reverts the submodule to v2.1.0 and replicates the behaviour of my specialization of type_caster<optional<Ref<const T>>> @nim65s

Copy link
Member

@nim65s nim65s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, now I can confirm that it works for me (with #352):

proxsuite> 100% tests passed, 0 tests failed out of 52

@ManifoldFR
Copy link
Member Author

Thanks, now I can confirm that it works for me (with #352):

proxsuite> 100% tests passed, 0 tests failed out of 52

Great !

Also (cc @fabinsch), what do you suggest we do with the ROS CI ? Just skip it ? It looks like it's whacked

@ManifoldFR ManifoldFR merged commit 5bc2b9d into Simple-Robotics:devel Aug 30, 2024
75 of 77 checks passed
@ManifoldFR ManifoldFR deleted the topic/nanobind branch August 30, 2024 12:33
@fabinsch
Copy link
Collaborator

Also (cc @fabinsch), what do you suggest we do with the ROS CI ? Just skip it ? It looks like it's whacked

I think @wxmerkt is aware of this issue for the TSID downstream as mentioned here. However, I don't know if there is a plan to fix it (soon). If not, I agree that we could deactivate it for now, like this auto-merge will work again here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move from pybind11 to nanobind
4 participants